Using Information Extraction to Aid the Discovery of Prediction Rules from Text
Authors
Abstract
Text mining and Information Extraction (IE) are both topics of significant recent interest. Text mining concerns applying data mining, a.k.a. knowledge discovery from databases (KDD), techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural-language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX that combines IE and KDD methods to perform a text mining task: discovering prediction rules from natural-language corpora. An initial version of DiscoTEX is constructed by integrating an IE module based on Rapier with a rule-learning module, Ripper. We present encouraging results from applying these techniques to a corpus of computer job postings from an Internet newsgroup.
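The two-stage pipeline described in the abstract (IE fills a structured database, then a rule learner mines it for prediction rules) can be sketched as below. This is a minimal illustration under stated assumptions, not DiscoTEX itself: a hypothetical keyword lexicon (`LEXICON`) stands in for the learned Rapier extractor, the slot names and postings are made up for illustration, and a simple support/confidence search over single-condition rules stands in for Ripper. The helper names `extract_template`, `to_items`, and `discover_rules` are likewise hypothetical.

```python
from itertools import permutations

# HYPOTHETICAL stand-in for the IE step: a fixed keyword lexicon per slot.
# DiscoTEX uses a learned Rapier extractor; this lookup only mimics its output
# format (a template whose slots hold sets of filler strings).
LEXICON = {
    "language": {"java", "perl", "python", "sql"},
    "platform": {"windows", "unix", "linux"},
    "application": {"oracle", "sybase", "excel"},
}


def extract_template(posting):
    """Fill each slot with the lexicon words that occur in the posting."""
    cleaned = posting.lower().replace(",", " ").replace(".", " ")
    tokens = set(cleaned.split())
    return {slot: vocab & tokens for slot, vocab in LEXICON.items()}


def to_items(template):
    """Flatten a filled template into a set of (slot, filler) items,
    i.e. one row of the structured database."""
    return {(slot, f) for slot, fillers in template.items() for f in fillers}


def discover_rules(templates, min_support=2, min_confidence=0.75):
    """Find single-condition rules A -> B between fillers of different slots.

    support    = number of documents containing both A and B
    confidence = support / number of documents containing A
    (A simple stand-in for Ripper, not Ripper's algorithm.)
    """
    rows = [to_items(t) for t in templates]
    counts = {}
    for row in rows:
        for item in row:
            counts[item] = counts.get(item, 0) + 1
    rules = []
    for a, b in permutations(sorted(counts), 2):
        if a[0] == b[0]:
            continue  # only predict one slot from a different slot
        both = sum(1 for row in rows if a in row and b in row)
        if both >= min_support and both / counts[a] >= min_confidence:
            rules.append((a, b, both, both / counts[a]))
    return rules


# Made-up postings for illustration only (not data from the paper).
postings = [
    "Java developer needed, Oracle and Unix experience.",
    "Senior Java programmer for Oracle databases on Unix.",
    "Perl and SQL scripting on Linux servers.",
    "Junior Perl hacker, Linux shop.",
    "Windows desktop support, Excel macros.",
]

templates = [extract_template(p) for p in postings]
rules = discover_rules(templates)
for (sa, fa), (sb, fb), support, conf in rules:
    print(f"{sa}={fa} -> {sb}={fb} (support {support}, confidence {conf:.2f})")
```

With these toy postings the search yields rules such as `language=java -> platform=unix`, while pairings seen only once (for example `sql` with `linux`) fall below `min_support`. Raising or lowering the two thresholds trades the number of rules against how reliable each one is.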
Similar sources
A New Text Mining Method for Extracting User Context Information to Improve Search Engine Result Ranking
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, Bing, etc. need to read a great many web documents and retrieve the most similar ones to the user ...
INDUCING VALUABLE RULES FROM IMBALANCED DATA: THE CASE OF AN IRANIAN BANK EXPORT LOANS
Knowledge Extraction from the Neural ‘Black Box’ in Ecological Monitoring
Phytoplankton biomass within the Saginaw Bay ecosystem (Lake Huron, Michigan, USA) was characterized as a function of select physical/chemical indicators. The complexity and variability of ecological systems typically make it difficult to model the influences of anthropogenic stressors and/or natural disturbances. Here, Artificial Neural Networks (ANNs) were developed to model chlorophyll a con...
Text Mining with Information Extraction
The popularity of the Web and the large number of documents available in electronic form has motivated the search for hidden knowledge in text collections. Consequently, there is growing research interest in the general topic of text mining. In this paper, we develop a text-mining system by integrating methods from Information Extraction (IE) and Data Mining (Knowledge Discovery from Databases ...